Test speedwagon tool by ljhh-0611 · Pull Request #48 · brekkylab/agent-k

ljhh-0611 · 2026-04-28T18:15:21Z

Summary

- Add a POST /sessions/{id}/messages endpoint that runs the agent, consumes its response stream, and returns the collected messages as JSON (with a 120s timeout)
- Integrate Speedwagon as a subagent: the main agent (GPT-5.4-mini) delegates RAG queries to the speedwagon subagent via a tool call, with an instruction that forces a document corpus search before answering factual questions
- Migrate SharedStore from Mutex to RwLock for concurrent read access in tool operations (search/find/read use store.read().await)
- Replace Arc<Mutex<AppState>> with Arc<AppState>, using DashMap for interior mutability, which eliminates the unnecessary outer lock
- Add lib.rs to expose modules for integration test access
- Add ApiError/ApiResult type aliases and AppError::internal()/not_found() helpers

Changed Files

Area	Files	What
backend-v2	`router.rs`	Message handler, Speedwagon store/toolset singletons (`OnceLock`), subagent session creation
backend-v2	`state.rs`	`HashMap` → `DashMap<Uuid, Arc<Mutex<Agent>>>`, sync API
backend-v2	`main.rs`	Remove outer `Mutex` on `AppState`
backend-v2	`error.rs`	`ApiError`, `ApiResult` types + helper constructors
backend-v2	`model/message.rs`	`SendMessageRequest`, `SendMessageResponse` DTOs
backend-v2	`lib.rs`	New — public module re-exports
backend-v2	`Cargo.toml`	Add `dashmap`, `tokio/time`, dev-deps for E2E
speedwagon	`store/mod.rs`	`SharedStore = Arc<RwLock<Store>>` type alias
speedwagon	`tool/*.rs`	`.lock().await` → `.read().await`
speedwagon	`main.rs`	Use `SharedStore` with `RwLock`
root	`.env.example`, `.gitignore`, `Cargo.toml`	Env template, ignore `.speedwagon/`, ailoy rev bump

Design History

1. SharedStore: `Arc<Store>` → `Arc<RwLock<Store>>`

Why not just Arc<Store>?

build_toolset(store) creates tool closures that capture a clone of the Arc. The same Arc is also held by router/test code that calls ingest()/purge(). Here's where it breaks:

// Tool closure captures Arc<Store>, calls &self methods — fine
let store = store.clone();
move |args| { store.search(&query, ...) }  // store: Arc<Store> → deref to &Store ✓

// But ingest/purge need &mut self — impossible from Arc<Store>
store.ingest(bytes, file_type)  // needs &mut Store, Arc only gives &Store ✗

Arc<T> only hands out &T (a shared reference). While other clones of the same Arc exist there is no way to get &mut T from it (Arc::get_mut returns None unless the handle is unique), and that is by design. So any code path that needs mutation requires interior mutability.
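The limitation can be reproduced with the standard library alone. The sketch below is illustrative only: `Store` here is a hypothetical stand-in with a `Vec<String>` inside, and `try_ingest` is a helper name invented for this example, not something from the speedwagon crate.

```rust
use std::sync::Arc;

// Hypothetical stand-in for the real Store; only the method receivers matter.
pub struct Store {
    docs: Vec<String>,
}

impl Store {
    pub fn new() -> Self {
        Store { docs: Vec::new() }
    }

    // Read path: a shared reference is enough.
    pub fn len(&self) -> usize {
        self.docs.len()
    }

    // Write path: needs exclusive access.
    pub fn ingest(&mut self, doc: &str) {
        self.docs.push(doc.to_string());
    }
}

// Tries to ingest through an Arc. Returns false when another clone of the
// Arc exists, because Arc::get_mut refuses to hand out &mut in that case.
pub fn try_ingest(store: &mut Arc<Store>, doc: &str) -> bool {
    match Arc::get_mut(store) {
        Some(inner) => {
            inner.ingest(doc);
            true
        }
        None => false,
    }
}
```

As soon as a tool closure holds its own clone, `try_ingest` starts returning false, which is the situation the router and test code would be in.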

Why not Arc<Mutex<Store>>?

Mutex works, but every access — including concurrent tool reads — takes an exclusive lock. If agent A is searching while agent B tries to search, B blocks until A finishes. Tools are read-only (search, find, read all take &self), so serializing them is unnecessary.

Decision: Arc<RwLock<Store>> — the standard reader-writer solution.

Tool closures:  store.read().await   → multiple tools can search simultaneously
ingest/purge:   store.write().await  → exclusive access, blocks until all readers finish

Type alias pub type SharedStore = Arc<RwLock<Store>> is exported from the speedwagon crate so backend-v2, CLI, and tests all share the same type.
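A minimal sketch of the reader-writer split, under stated assumptions: it uses `std::sync::RwLock` instead of the async lock implied by `.read().await` so that it stays dependency-free, and `Store`, `new_shared_store`, `search_tool`, and `ingest_doc` are hypothetical names chosen for illustration.

```rust
use std::sync::{Arc, RwLock};

// Hypothetical stand-in for the real Store.
pub struct Store {
    docs: Vec<String>,
}

impl Store {
    // Read-only, like the search/find/read tools.
    pub fn search(&self, query: &str) -> Vec<String> {
        self.docs
            .iter()
            .filter(|d| d.contains(query))
            .cloned()
            .collect()
    }

    // Mutating, like ingest/purge.
    pub fn ingest(&mut self, doc: &str) {
        self.docs.push(doc.to_string());
    }
}

// Same shape as the alias described above, with the std lock swapped in.
pub type SharedStore = Arc<RwLock<Store>>;

pub fn new_shared_store() -> SharedStore {
    Arc::new(RwLock::new(Store { docs: Vec::new() }))
}

// Tool side: takes the shared (read) lock, so several searches can overlap.
pub fn search_tool(store: &SharedStore, query: &str) -> Vec<String> {
    let guard = store.read().expect("store lock poisoned");
    guard.search(query)
}

// Router/test side: takes the exclusive (write) lock.
pub fn ingest_doc(store: &SharedStore, doc: &str) {
    let mut guard = store.write().expect("store lock poisoned");
    guard.ingest(doc);
}
```

With an async runtime the guards would come from `.read().await` and `.write().await`, but the ownership picture is the same: every holder keeps a cheap `Arc` clone and the lock decides who may mutate.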

2. AppState: Remove Outer Mutex → DashMap Interior Mutability

Problem: Arc<Mutex<AppState>> locks the entire state for every handler call. Two concurrent requests to different sessions serialize on the same Mutex.

Decision: Replace HashMap with DashMap (sharded concurrent map). insert_agent and get_agent become &self methods — no .await, no outer lock. Main becomes Arc::new(AppState::new()).

Before: Arc<Mutex<AppState { HashMap<Uuid, Agent> }>>
After:  Arc<AppState { DashMap<Uuid, Arc<Mutex<Agent>>> }>

3. Why `Arc<Mutex<Agent>>` Inside DashMap

Three layers, each handling a different concurrency level:

Layer	What It Does
`DashMap`	Concurrent session lookup/insert across requests
`Arc`	Cloneable handle — decouples agent lifetime from DashMap entry
`Mutex<Agent>`	Per-agent exclusivity — `Agent::run(&mut self)` requires exclusive access while the stream is alive

Two messages to different sessions run fully in parallel. Two messages to the same session serialize on the agent's Mutex, which is the correct behavior because the agent maintains conversation state.
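The three layers can be sketched without external crates. This is an approximation under loud assumptions: `RwLock<HashMap<..>>` stands in for `DashMap` (it is not sharded), `SessionId = u64` stands in for `Uuid`, and `Agent` and `send` are hypothetical simplifications.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, RwLock};

// Stand-in for Uuid.
pub type SessionId = u64;

// Hypothetical agent that only records its conversation history.
pub struct Agent {
    history: Vec<String>,
}

impl Agent {
    // Mirrors the &mut self receiver of Agent::run described above.
    pub fn run(&mut self, query: &str) -> usize {
        self.history.push(query.to_string());
        self.history.len()
    }
}

pub struct AppState {
    // Stand-in for DashMap<Uuid, Arc<Mutex<Agent>>>.
    agents: RwLock<HashMap<SessionId, Arc<Mutex<Agent>>>>,
}

impl AppState {
    pub fn new() -> Self {
        AppState { agents: RwLock::new(HashMap::new()) }
    }

    // &self method: no outer lock around AppState is needed.
    pub fn insert_agent(&self, id: SessionId) {
        let agent = Arc::new(Mutex::new(Agent { history: Vec::new() }));
        let mut map = self.agents.write().expect("map lock poisoned");
        map.insert(id, agent);
    }

    // Clones the Arc so the map lock is released before the agent runs.
    pub fn get_agent(&self, id: SessionId) -> Option<Arc<Mutex<Agent>>> {
        let map = self.agents.read().expect("map lock poisoned");
        map.get(&id).cloned()
    }
}

// Runs one message; only the target session's Mutex is held while running.
pub fn send(state: &AppState, id: SessionId, query: &str) -> Option<usize> {
    let agent = state.get_agent(id)?;
    let mut guard = agent.lock().expect("agent lock poisoned");
    Some(guard.run(query))
}
```

Because `get_agent` returns a cloned `Arc`, a long-running `run` on one session never holds the map lock, so lookups and inserts for other sessions proceed.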

4. Why `lib.rs` Is Needed

Rust integration tests (tests/*.rs) are external to the crate. Without lib.rs, modules defined in main.rs are invisible to tests. Adding lib.rs with pub mod router; pub mod state; ... creates a library crate that both the binary and tests can import as agent_k_backend::*.

5. STORE / TOOLSET Singletons (OnceLock)

Problem: Creating a new Store per session means each agent gets a separate store — documents ingested in session A are invisible to session B.

Decision: OnceLock<SharedStore> and OnceLock<ToolSet> — process-wide singletons, initialized lazily on first call. The ToolSet captures the Store via closures, so they must reference the same instance.
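A rough sketch of the lazy singleton, assuming a simplified `Store` and the std `RwLock`; `shared_store`, `ingest`, and `doc_count` are hypothetical helper names used only for this illustration.

```rust
use std::sync::{Arc, OnceLock, RwLock};

// Hypothetical stand-in for the real Store.
pub struct Store {
    docs: Vec<String>,
}

pub type SharedStore = Arc<RwLock<Store>>;

// Process-wide slot; stays empty until the first call to shared_store().
static STORE: OnceLock<SharedStore> = OnceLock::new();

// Returns a handle to the single shared store, creating it on first use.
pub fn shared_store() -> SharedStore {
    STORE
        .get_or_init(|| Arc::new(RwLock::new(Store { docs: Vec::new() })))
        .clone()
}

// Any session can write through the singleton.
pub fn ingest(doc: &str) {
    let store = shared_store();
    let mut guard = store.write().expect("store lock poisoned");
    guard.docs.push(doc.to_string());
}

// Any other session sees the same documents.
pub fn doc_count() -> usize {
    let store = shared_store();
    let guard = store.read().expect("store lock poisoned");
    guard.docs.len()
}
```

Every caller receives a clone of the same `Arc`, so a toolset built from one handle and a router holding another still operate on one store.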

⚠️ Development-only design. Production would need:

Current Limitation	Production Requirement
Store path hardcoded to `CARGO_MANIFEST_DIR`	Configurable via env/config
Single Store for entire process	Per-knowledge-base stores (multi-tenant)
Tests share the same singleton	Test isolation (separate Store instances)

6. Agent Creation: Direct vs Subagent Pattern

Two patterns exist in create_session — subagent is active, direct is commented out for easy switching.

Direct: Agent::try_with_tools(SpeedwagonSpec.into_spec(), provider, toolset) — agent owns search/find/read tools directly. SpeedwagonSpec's built-in SYSTEM_PROMPT instructs tool usage.

Subagent: AgentBuilder::new(model).subagent(card, sw_agent).build() — main agent has a single "speedwagon" tool that delegates to a separate agent. Allows main/sub to use different models.

E2E Test Behavior Comparison

Aspect	Direct	Subagent
Tools on agent	search, find, read, calculate	speedwagon (delegates internally)
Tool invocation reliability	High — SYSTEM_PROMPT directly instructs	Depends on instruction + card quality
Initial E2E result	✅ Pass (Glorkville in response)	❌ Fail — main agent skipped tool call
Failure cause	N/A	Vague card description + no instruction
After fix	N/A	Instruction forcing tool use + descriptive card → improved
Model flexibility	Locked to SpeedwagonSpec default	Main and sub can use different models
Complexity	Low (one line)	High (AgentBuilder + AgentCard + instruction)
Debuggability	Tool calls visible directly	Subagent internals are opaque

Subagent Failure → Fix Timeline

1. Initial: the card description was vague ("Speedwagon agent"), so the main agent didn't recognize the tool's purpose.
2. Added instruction: "You MUST use the speedwagon tool..." The test still failed.
3. Improved card description: "Search the knowledge base for answers. This tool has access to uploaded documents..." Results improved.

Takeaway: both the instruction AND the card description must be strong for reliable subagent invocation.

Both patterns are intentionally kept in the code (active vs commented) because they have different trade-offs.

7. Why `send_message` API Was Added

create_session only creates a session with an agent. Without send_message, there's no way to talk to it. Current design is synchronous JSON (not SSE):

Client → POST /sessions/{id}/messages { content: "..." }
                    ↓
         agent.run(query) — consume entire stream
         collect all messages + extract final assistant text
                    ↓
Client ← 200 { messages: [...], final_content: "..." }

Decision	Reason
120s timeout	Subagent round-trip (main → sub → main) can be slow
`final_content` field	Convenience — client gets last assistant text without parsing messages array
No SSE	Goal at this stage is E2E validation. SSE already implemented in backend v1, will be ported later
Return full `messages` array	Includes tool_call/tool_result intermediate messages — useful for debugging and UI
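The response-assembly step in the flow above could look roughly like the following. It is a sketch, not the handler from this PR: `Role`, `Message`, and `build_response` are hypothetical, and the real DTOs in `model/message.rs` may carry different fields. Only the `messages` and `final_content` field names come from the description.

```rust
// Hypothetical message roles; the real message type comes from the agent library.
#[derive(Debug, Clone, PartialEq)]
pub enum Role {
    User,
    Assistant,
    Tool,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    pub role: Role,
    pub text: String,
}

// Shape of the JSON body: the full transcript plus a convenience field.
#[derive(Debug, PartialEq)]
pub struct SendMessageResponse {
    pub messages: Vec<Message>,
    pub final_content: String,
}

// Picks the last non-empty assistant text as final_content and keeps the
// whole message list (including tool traffic) for debugging and UI use.
pub fn build_response(messages: Vec<Message>) -> SendMessageResponse {
    let final_content = messages
        .iter()
        .rev()
        .find(|m| m.role == Role::Assistant && !m.text.is_empty())
        .map(|m| m.text.clone())
        .unwrap_or_default();
    SendMessageResponse { messages, final_content }
}
```

Skipping empty assistant messages is an assumption made here so that a tool-call turn with no text does not mask the final answer.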

E2E Test

test_ingest_message_purge_cycle (#[ignore], requires OPENAI_API_KEY):

1. Ingest a document with a unique fact ("The capital of Freedonia is Glorkville")
2. Create a session, then send a message asking about the fact
3. Assert the response contains "Glorkville" (verifies the subagent RAG pipeline)
4. Purge the document, send the same message again, and assert a non-empty response

cargo test -p agent-k-backend test_ingest_message_purge_cycle -- --ignored

🤖 Generated with Claude Code

Co-authored-by: Copilot <copilot@github.com>

Integrate base branch features (SQLite repository, per-session sandbox, SSE streaming, message persistence, session CRUD) with speedwagon RAG subagent, DashMap concurrency, and ApiError/ApiResult error patterns.

Key integration points:
- build_agent() combines builtin tools (bash/python/web_search), per-session sandbox, and speedwagon subagent
- DashMap-based AppState with injected SharedStore/ToolSet
- resolve_agent() lazy-creates agents with full history restoration
- All tests updated for new AppState::new(repo, store, toolset) signature

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Co-authored-by: Copilot <copilot@github.com>

khj809 · 2026-05-04T12:44:19Z

@jhlee525 Please review my latest commit, which applies the changes in brekkylab/ailoy#391. I'm not sure if this follows your expected practices exactly.

jhlee525 · 2026-05-06T05:11:51Z

I think it aligns with the current design! Thank you for the changes.

- Remove unused `into_runtime` and `into_runtime_with_provider` from SpeedwagonSpec (never called anywhere)
- Migrate e2e_test.rs to use shared `common` module helpers instead of local `json_request` and `extract_assistant_text` duplicates
- Fix `.sandbox()` → `.runenv()` in commented-out alternative build_agent

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: rename knowledge-agent crate to speedwagon and restructure modules

  Renames the crate and reorganizes source into store (indexer, parser, searcher, translator) and tool modules.

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* add functions
* update tools
* add global tool function
* add cli & tests
* add get_many & ingest_many functions
* update internal features
* apply latest ailoy, initialize backend v2
* update ailoy the one ToolFactory is Send + Sync
* find: line-based matching with bare-word fallback and query syntax (#40)

  Match line-by-line internally and report byte offsets externally. Bare-word queries get progressive AND -> HALF -> OR fallback; the level used is surfaced in the response so callers can tell strict matches from relaxed ones.

  Replace the single case-insensitive regex query with a small structured-query parser supporting "phrase", +term/-term, AND/OR/NOT, (group), and /regex/. Any explicit operator opts out of fallback so caller intent is preserved.

  Co-authored-by: nuri-yoo <nuri-yoo@users.noreply.github.com>
* Add purpose metadata field to Store ingest (#41)

  Add LLM-generated `purpose` as a third indexed field alongside `title` and `content`. The prompt and preview strategy match the v1 variant from the metadata ablation report (first 3000 chars, "what queries should find this document?" framing), which improved BM25 hit@5 by +15.3pp on FinanceBench over the title-only baseline.

  - `Document` gains a `purpose: String` field.
  - `parser::get_purpose()` calls a new `PurposeAgent` (gpt-5.4-mini, JSON response parsed via `serde_json`). Empty/invalid responses fall back to an empty string with a warn log so ingest still succeeds; transport-level errors propagate.
  - Tantivy schema adds `purpose` as `TEXT|STORED`. `open_or_create` detects schema mismatch on existing indexes and rebuilds the index directory automatically (corpus is preserved, so re-ingest only pays the LLM cost).
  - `Store::ingest` and `Store::ingest_many` invoke `get_title` and `get_purpose` in parallel via `tokio::try_join!`.
  - `QueryParser` default fields are extended to `[title, purpose, content]` so purpose terms participate in BM25 scoring.

  Closes #37

  Co-authored-by: nuri-yoo <nuri-yoo@users.noreply.github.com>
* add basic apis to test sandbox and messaging
* add keep-alive on SSE, factor out build_agent
* add GET/DELETE messages API
* use latest of ailoy PR#391
* update ailoy
* Add HTML support for Store (#46)

  * feat: add HTML file type support for `Store`
  * refactor(translator): split translator into files based on FileType
  * refactor(speedwagon): centralize filetype mapping and translator dispatch
  * feat(speedwagon-cli): apply shared filetype mapping to ingest
  * fix: apply chrome-stripping universally instead of site-specific branch
  * fix: use single-quoted string to avoid complex escapes
  * fix: adjust condition to prepend title and
  * fix: add more chrome types
  * refactor: use `html_to_markdown_rs`, not `dom_smoothie` and `dom_query`
* feat(speedwagon): add DescriptionAgent for KB-level description generation (#47)

  * feat(speedwagon): add DescriptionAgent for KB-level description generation
  * refactor(speedwagon): self-anchor description prompt and trim doc-comments

    - Drop the "alongside other KBs" framing from the prompt opening so the model isn't primed to emit comparison vocabulary; the existing "do not compare" clause now matches the framing.
    - Note Korean output's ~1/3 char density at the same budget; per-language budgets are deferred (LLM can't count Korean words reliably either).
    - Trim verbose doc-comments across description.rs and Store::describe; add cost note (~24K input chars at N=200) to discourage synchronous use on indexing hot paths.

    Verified against the 4-KB / 12-probe harness: routing accuracy stays 12/12 across the shipped baseline, the prompt-only change, and word-budget variants. cargo test -p speedwagon --lib description: 12 passed, 1 ignored.
  * feat(speedwagon): force description output to English
  * refactor(speedwagon): borrow doc slices in description path

    Switch `&[(String, String)]` / `&[String]` to `&[(&str, &str)]` / `&[&str]` in `generate`, `get_description`, `build_user_message`, and `fallback_description`, and drop the upfront title/purpose clone in `Store::describe`. Caller-side `Document` strings are borrowed directly, and the fallback title vec is built only on the empty-LLM-response branch.

  ---------

  Co-authored-by: nuri-yoo <nuri-yoo@users.noreply.github.com>
* Integrating helper agents into a common interface (#50)

  * refactor(speedwagon): use ailoy default_provider for LLM helpers
  * refactor(speedwagon): consolidate LLM helpers under HelperAgent trait
  * refactor(speedwagon): tighten HelperAgent contract and response handling
  * refactor(speedwagon): pick helper model from preference list, fall back when none registered
* Test speedwagon tool (#48)

  * UPDATE : create session with speedwagon
  * ADD : Session message & e2e rough test

    Co-authored-by: Copilot <copilot@github.com>
  * chore : test enhance & making agent change graceful

    Co-authored-by: Copilot <copilot@github.com>
  * UPDATE : instructions of swcard/main-agent
  * ADD : commented build_agent with direct speedwagon tool

    Co-authored-by: Copilot <copilot@github.com>
  * remove : duplicated dep in dev
  * update ailoy
  * align with latest ailoy develop
  * refactor: remove dead code and deduplicate e2e test helpers

    - Remove unused `into_runtime` and `into_runtime_with_provider` from SpeedwagonSpec (never called anywhere)
    - Migrate e2e_test.rs to use shared `common` module helpers instead of local `json_request` and `extract_assistant_text` duplicates
    - Fix `.sandbox()` → `.runenv()` in commented-out alternative build_agent

    Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

  ---------

  Co-authored-by: Copilot <copilot@github.com>
  Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
  Co-authored-by: khj809 <onsealeatang@gmail.com>
* chore(deps): bump ailoy to post-#391 (AgentBuilder + sandbox sharing) (#51)

  Bumps ailoy from c42231e1 (2026-04-21) to 098a8289 (PR #391, 2026-05-04). This pulls in 27 commits worth of breaking changes; this PR only patches speedwagon to compile and pass tests against the new API. AgentBuilder itself is not adopted yet — that's PR #52.

  Breaking changes absorbed
  - ailoy#389: ToolFactory introduced; tool sources now live on ToolProvider.custom(ToolFactory::simple(desc, func)) rather than ToolSet.insert(name, desc, func).
  - ailoy#390: RunEnv trait + sandbox feature split. AgentState carries an Arc<dyn RunEnv>, defaulting to Local. No direct touch in this PR; speedwagon never constructed sandboxes.
  - ailoy#391: Provider registry overhaul. AgentProvider.models is now a LangModelProvider with .insert(pattern, LangModelProviderElem::API{..}) / .get(name) glob-matching. The old default_provider_mut().model_openai() / model_claude() / model_gemini() helper constructors were removed, along with provider.get_model(...). Agent::try_with_tools(spec, provider, toolset) is gone — tools must live on provider.tools.

  Speedwagon changes
  - tool/mod.rs: build_toolset(store) -> ToolSet renamed to build_tool_provider(store) -> ToolProvider, using ToolFactory::simple.
  - main.rs::build_agent: no separate toolset arg; clones the global provider, overrides provider.tools with the store-bound tools, Agent::try_with_provider(spec, &provider).
  - store/helper.rs:88: provider.get_model(m) -> provider.models.get(m). Same semantics, new accessor location.
  - New module speedwagon::provider with register_provider_from_env that reads OPENAI_API_KEY / ANTHROPIC_API_KEY / GEMINI_API_KEY and registers the same glob patterns the removed ailoy helpers used (openai/*, anthropic/claude-*, google/gemini-*). main.rs and the description.rs integration test both call this — keeps the env-key → glob-pattern mapping in one place, preserving PR #49's invariant that helper modules never read env directly.

  Out of scope
  - chat-agent / backend / knowledge-agent: these were already broken on the refactoring-applied baseline (workspace member vs. standalone ambiguity). They will be brought back in a separate PR alongside the AgentBuilder migration (planned PR #52).

  Verification
    cargo check -p speedwagon --tests --all-features # clean
    cargo test -p speedwagon --lib --all-features # 71 passed; 0 failed; 2 ignored

  Co-authored-by: nuri-yoo <nuri-yoo@users.noreply.github.com>
* feat: add auth/user APIs (#54)

  Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* ADD: batch document ingest + bulk purge with partial success (#57)

  * feat: batch document ingest + bulk purge with partial success

    - Add `Store::ingest_many` partial success (IngestResult/IngestFailure) with batch index optimization and best-effort cleanup on failure
    - Add `Store::purge_many` (PurgeResult/PurgeFailure)
    - POST /documents: multi-file multipart upload with per-file validation
    - DELETE /documents: bulk purge via JSON body { ids: [...] }
    - GET /documents/{id}: single document retrieval
    - Response DTOs: BatchIngestResponse, BatchPurgeResponse, FailedItem
    - 14 new document tests + e2e test rewritten for multi-doc HTTP flow

    Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
  * Preserve batch-ingest failure evidence while reducing helper duplication

    Keep the document batch API behavior unchanged while removing a silent fallback in failure indexing and trimming duplicated test HTTP setup.

    Constraint: Cleanup is scoped to PR #57 / a69222c document-batch changes only.
    Rejected: Rewriting broader Speedwagon parser/tool clippy warnings | outside the requested commit scope.
    Confidence: high
    Scope-risk: narrow
    Directive: Keep ingest_many response semantics provisional until the Store contract is hardened.
    Tested: cargo fmt --check -p agent-k-backend -p speedwagon; cargo check -p agent-k-backend; cargo test -p agent-k-backend --test document_test; cargo test -p speedwagon --no-default-features --lib; cargo clippy -p agent-k-backend --tests
    Not-tested: live ignored e2e RAG test requiring OPENAI_API_KEY
  * use aide::axum::routing
  * Keep document batch failures item-scoped

    Review feedback showed batch ingest and purge paths could hide item-level failures or over-clean existing artifacts. This keeps API ids string-shaped at the boundary while parsing per item before store operations.

    Constraint: PR #57 review requested String id consistency and explicit multipart/corpus failure handling
    Rejected: Converting speedwagon document ids to Uuid | would broaden index/tool/CLI scope beyond PR
    Confidence: high
    Scope-risk: narrow
    Directive: Keep speedwagon index/tool IDs string-shaped unless a broader migration is planned
    Tested: cargo fmt --check -p speedwagon -p agent-k-backend; cargo test -p speedwagon --lib; cargo test -p agent-k-backend --test document_test; cargo check -p speedwagon -p agent-k-backend; git diff --check
    Not-tested: clippy -D warnings; blocked by preexisting warnings outside this change

  ---------

  Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
  Co-authored-by: khj809 <onsealeatang@gmail.com>
* update ailoy
* feat: add Project workspace and session sharing model (#62)

  Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: jhlee525 <bmrcreative90@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: nuri <yoonuri1@gmail.com>
Co-authored-by: nuri-yoo <nuri-yoo@users.noreply.github.com>
Co-authored-by: Park Woorak <wrpark@brekkylab.com>
Co-authored-by: JaehunLee <ljhh0611@gmail.com>
Co-authored-by: Copilot <copilot@github.com>

ljhh-0611 and others added 5 commits April 29, 2026 00:12

UPDATE : create session with speedwagon

1ac81ab

ADD : Session message & e2e rough test

7eeb858

Co-authored-by: Copilot <copilot@github.com>

chore : test enhance & making agent change graceful

5663974

Co-authored-by: Copilot <copilot@github.com>

UPDATE : instructions of swcard/main-agent

6b4dd0d

ljhh-0611 requested review from jhlee525 and khj809 and removed request for khj809 April 29, 2026 06:24

ljhh-0611 self-assigned this Apr 29, 2026

ADD : commented build_agent with direct speedwagon tool

920586a

Co-authored-by: Copilot <copilot@github.com>

khj809 reviewed Apr 29, 2026

Comment thread backend-v2/Cargo.toml Outdated

khj809 and others added 3 commits April 29, 2026 19:24

resolve conflicts

72f96f2

remove : duplicated dep in dev

4e43a1f

update ailoy

2992655

khj809 reviewed Apr 29, 2026

Comment thread backend-v2/src/router.rs Outdated

Comment thread backend-v2/src/router.rs Outdated

ljhh-0611 marked this pull request as ready for review April 30, 2026 04:08

ljhh-0611 marked this pull request as draft April 30, 2026 04:31

align with latest ailoy develop

3b9ea6d

ljhh-0611 marked this pull request as ready for review May 6, 2026 07:35

khj809 approved these changes May 6, 2026

ljhh-0611 merged commit fa99b3c into refactoring-applied-backed-v2 May 6, 2026

ljhh-0611 deleted the add-speedwagon-tool branch May 6, 2026 11:02

ljhh-0611 mentioned this pull request May 7, 2026

ADD: batch document ingest + bulk purge with partial success #57

Merged

4 tasks

ljhh-0611 merged 11 commits into refactoring-applied-backed-v2 from add-speedwagon-tool
